You’ve been contracted as a GIS technician for the non-profit organization “Street Trees for Wildlife”! The non-profit will be conducting a study in which they will be sampling caterpillars on street trees in the District of Columbia (i.e., Washington DC).
Your goal is to provide GIS support for the organization and use GIS to guide a field team to the locations where they will collect their samples. Specifically, in this project, you will follow a series of steps to generate classified sampling locations that are within 1 kilometer of commuter train stations in Washington DC.
This project will evaluate a number of GIS-in-R skills, including (but not limited to!) your ability to:
Total points available: 30
You will hand in a single R Markdown file for this assignment. This file must include all of the code used (provided in code chunks) in answering each question.
Your response to each “memo” is worth 10 points. You will be graded on your 10 highest scoring memos (feel free to skip some memos if you are confident in your answers!). If you get stuck, or have chosen to skip one of the memos, use one of the lifeline files to assist you.
Points may be deducted from each question’s total:
Note: The maximum deduction is the total points value for a given question
As always, you must ensure that your R Markdown document runs out-of-the-box – in other words, the document will knit without error. If it does not, you will receive a 20% deduction on the total grade for the assignment. Some tips for doing so:
source() or read_csv())setwd() in your code
(Never use setwd()!)Click the blue button below to view the functions that I used in completing this problem set. Although you are not required to use the same functions as I did, you might find this list helpful!
::<-=+!>>=$~()as.numericas.matrixcis.nalevelslibrarylist.fileslist2envrmsumuniquearrangebind_colscase_whenif_elsefilterleft_joinmutatenpullselectslice_maxsummarizefct_collapse+aesgeom_sfggplotscale_fill_viridis_ctheme_voidkable%>%mapread_csvset_namesst_areast_as_sfst_bufferst_castst_coordinatesst_filterst_intersectionst_joinst_readst_transformsetNamesstr_detectaggregateclassifydistanceextractmaskrastrasterizeas_tibbledrop_napivot_widertm_basemaptm_dotstm_rastertm_shapetmap_modeNote: The packages dplyr, forcats, ggplot2, magrittr, purrr,
readr, rlang, stringr, tibble, and tidyr are all part of the tidyverse
and are loaded with library(tidyverse).
Welcome to the team GIS Guru!
Please start by saving this R Markdown file with the naming convention “final_project_[your last name]_[your first name].Rmd” and modify the YAML header by providing your name and the date.
Sincerely,
Your new boss!
Here you go boss! I have created an R Markdown file using your chosen naming convention and added my name and date to the YAML header.
Hello GIS Guru,
We’re super into the sf, tidyverse, and tmap packages around here. Especially the tidyverse – I’ve got a life-sized poster of Hadley in my office. Please make sure to load those packages before proceeding!
Cheers,
Your new boss!
P.S. I’m also a big fan of interactive maps, so please set your tmap mode to “view”!
That was an easy one boss, here’s how I loaded the packages:
library(sf)
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
library(tmap)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
## (status 2 uses the sf package in place of rgdal)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
I also set the tmap mode to view for you – here’s how I did it:
tmap_mode("view")
## tmap mode set to interactive viewing
Dear GIS Guru,
Let’s really get things started by reading in two files from the
folder data/final_project_data – nlcd_key.csv
and lc_temp.
As you are probably suspecting, the file nlcd_key is a
key to the dataset classified_lc.tif. It is a tabular
dataset with the following fields:
id: A unique identifier for land cover classesname: A unique name defining each land cover classcolor_value: Colors typically used to visualize land
cover classesdescription: A long description of each classThe file classified_lc.tif is a GeoTIFF raster obtained
from the
Multi-Resolution
Land Characteristics Consortium (MRLC). This is a categorical raster
that describes the land cover within the spatial extent of the District
of Columbia. The value of each cell represents the dominant land cover
class (link) within that cell. The resolution of the raster is
approximately 30 x 30 m and the CRS is EPSG 32618 (datum = World
Geodetic System 1984 (WGS 84); projected UTM Zone 18N).
Please:
Read in the raster file nlcd_key.csv and globally
assign the name nlcd_key to the resultant file.
Read in the raster file classified_lc.tif and
globally assign the name lc_temp to the resultant
file.
Warm regards,
Your “meta” boss
I read in nlcd_key.csv and assigned the name
nlcd_key to my global environment. Here’s the code I used
to do it!
nlcd_key <-
read_csv("data/final_project_data/nlcd_key.csv")
## Rows: 20 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): name, color_value, description
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I used the terra package to read in the raster
classified_lc.tif and assigned the name
lc_temp to my global environment. Here’s how I did it:
lc_temp <-
terra::rast("data/final_project_data/classified_lc.tif")
Bonjour GIS Guru!
I have a number of shapefiles that I would like you to use for this project. Each shapefile was obtained from Open Data DC and collected/stored using the unprojected coordinate reference system EPSG 4326. The files include:
metro_stops.geojson: A point shapefile that describes
the location of commuter train stations in Washington DC.national_parks.geojson: A multipolygon shapefile where
each shape represents land in Washington DC that is owned and operated
by the US National Park service.streets.geojson: A linestring shapefile of roads in
Washington DC (Note: This represents a simplified representation of
the original version from Open Data DC).wards: A polygon shapefile that describes political
regions (e.g., local voting areas) in Washington DC.Please use iteration to read in all of the geojson files in at once. In doing so:
NAME (but maintain the
geometry).NAME, to lower case.Au revoir,
Your sophisticated boss
Hi chef, I used iteration to read in, subset, and transform the shapefiles as you asked! Here is how I did it:
list.files("data/final_project_data",
pattern = 'geojson',
full.names = TRUE) %>%
purrr::map(
~ st_read(.x) %>%
st_transform(crs = 32618) %>%
select(name = NAME)) %>%
set_names("metro_stops",
"national_parks",
"streets",
"wards") %>%
list2env(envir = .GlobalEnv)
## Reading layer `Metro_Stations_in_DC' from data source
## `C:\Users\User\Documents\final_project\data\final_project_data\Metro_Stations_in_DC.geojson'
## using driver `GeoJSON'
## Simple feature collection with 40 features and 13 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -77.085 ymin: 38.84567 xmax: -76.93526 ymax: 38.97609
## Geodetic CRS: WGS 84
## Reading layer `National_Parks_DC' from data source
## `C:\Users\User\Documents\final_project\data\final_project_data\National_Parks_DC.geojson'
## using driver `GeoJSON'
## Simple feature collection with 457 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 316307.5 ymin: 4296878 xmax: 333240.8 ymax: 4318263
## Projected CRS: WGS 84 / UTM zone 18N
## Reading layer `dc_streets_simple' from data source
## `C:\Users\User\Documents\final_project\data\final_project_data\streets_simple.geojson'
## using driver `GeoJSON'
## Simple feature collection with 34021 features and 2 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -77.11664 ymin: 38.79323 xmax: -76.90953 ymax: 38.99526
## Geodetic CRS: WGS 84
## Reading layer `Wards_from_2022' from data source
## `C:\Users\User\Documents\final_project\data\final_project_data\Wards_from_2022.geojson'
## using driver `GeoJSON'
## Simple feature collection with 8 features and 325 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -77.1198 ymin: 38.79164 xmax: -76.90915 ymax: 38.99597
## Geodetic CRS: WGS 84
## <environment: R_GlobalEnv>
Hiya GIS Guru,
I’ve got a big processing step for you. The file
Urban_Forestry_Street_Trees.csv is a pretty gigantic
tabular dataset that I downloaded from
Open Data DC.
Each record in the data represents a street tree in the District of
Columbia. There are a lot of columns in the dataset, but we are only
interested in:
GENUS_NAME: The genus of the street treesDBH: The diameter of the street tree trunks, in
inchesX: The longitudinal coordinates of the street trees,
recorded in EPSG 4326Y: The latitudinal coordinates of the street trees,
recorded in EPSG 4326Please:
Urban_Forestry_Street_Trees.csvGENUS_NAME, DBH,
X, and Y (in that order)genus, dbh,
longitude, and latitude (respectively)dbh is greater than 12
inches and then remove the field dbhwards
shapefiletrees_temp to the resultant
objectTa ta,
Your oddly data-aware boss
Hi there boss, that was a lot of steps! Here is how I generated the
trees_temp point shapefile following the steps that you
outlined above:
trees_temp <-
read_csv("data/final_project_data/Urban_Forestry_Street_Trees.csv") %>%
select(
genus = GENUS_NAME,
dbh = DBH,
longitude = X,
latitude = Y) %>%
drop_na() %>%
filter(
!str_detect(genus, pattern = "^No"),
dbh > 12) %>%
select(!dbh) %>%
st_as_sf(coords = c('longitude', 'latitude'),
crs = 4326) %>%
st_transform(crs = 32618) %>%
st_filter(wards)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 208040 Columns: 54
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (29): SCI_NM, CMMN_NM, GENUS_NAME, FAM_NAME, DATE_PLANT, FACILITYID, VIC...
## dbl (17): X, Y, WARD, TBOX_L, TBOX_W, DBH, MBG_WIDTH, MBG_LENGTH, MBG_ORIENT...
## lgl (8): SPECIALPHOTO, PHOTOREMARKS, GIS_ID, CREATOR, CREATED, EDITOR, EDIT...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Top of the morning to you GIS Guru!
Our team is primarily interested in oaks (Quercus) and maples (Acer).
Reclassify the variable genus such that “Quercus” is
classified as “oak”, “Acer” is classified as “maple”, and all other taxa
are classified as “other”. Let’s save this one for later use – globally
assign the name trees_classified to the resultant
object.
Slán leat,
Your gaffer
P.S. We are not going to use trees_temp again, so please
remove it from your global environment.
Hello gaffer, I reclassified the genus field as you
requested and globally assigned the name trees_classified
to the resultant point shapefile. Here’s how I did it:
trees_classified <-
trees_temp %>%
mutate(genus =
case_when(genus == "Quercus" ~ "oak",
genus == "Acer" ~ "maple",
TRUE ~ "other"))
I also removed trees_temp from the global
environment:
rm(trees_temp)
How dost thou fare GIS Guru?
The “wards” (District of Columbia voting areas) are keenly interested in our study! They want to know which ward has the most trees. Generate a kable table where:
Fare-thee-well,
Your old-fashioned boss
Good morrow, boss! Here is a kable table that shows the number of oaks, maples, and total trees by ward (and the code I used to create it):
trees_classified %>%
st_join(wards) %>%
as_tibble() %>%
summarize(n = n(),
.by = c(genus, name)) %>%
pivot_wider(names_from = genus,
values_from = n) %>%
mutate(total = other + maple + oak) %>%
select(!other) %>%
arrange(name) %>%
knitr::kable()
| name | maple | oak | total |
|---|---|---|---|
| Ward 1 | 412 | 1281 | 3038 |
| Ward 2 | 643 | 1680 | 4792 |
| Ward 3 | 2618 | 4940 | 12162 |
| Ward 4 | 1451 | 4288 | 9328 |
| Ward 5 | 1095 | 2912 | 6725 |
| Ward 6 | 1205 | 2011 | 5494 |
| Ward 7 | 1904 | 2572 | 7399 |
| Ward 8 | 711 | 1726 | 4278 |
G’day GIS Guru,
The “battle of the wards” is heating up! Some of the smaller wards thought our analysis wasn’t fair. Let’s give them a map of tree density by ward. Please:
Hooroo,
Your ggplotin’ boss
Hi Boss, this ggplot map of total tree density by ward should appease those smaller wards:
trees_classified %>%
st_join(wards) %>%
as_tibble() %>%
summarize(n = n(),
.by = c(genus, name)) %>%
pivot_wider(names_from = genus,
values_from = n) %>%
mutate(total = other + maple + oak) %>%
select(!other) %>%
arrange(name) %>%
left_join(wards, .) %>%
mutate(
area =
st_area(.) %>%
units::set_units("km^2"),
density = total / area %>%
as.numeric()) %>%
ggplot() +
geom_sf(aes(fill = density)) +
scale_fill_viridis_c(direction = -1,
na.value = "#dcdcdc")
## Joining with `by = join_by(name)`
Howdy GIS Guru,
So far, so good pardner! Unfortunately, we have not gotten permission to sample on land managed by the National Park Service! Please:
trees_classified to trees that are NOT within
the shapefile national_parks and globally assign the name
trees_no_nps to the resultant object. Hint: Unless you
like waiting and/or overwhelming your system’s memory, avoid using
st_difference() for this operation!national_parks and trees_classified
from your global environment.genus field
and the and the points are clustered.Adios,
Your rootin’ tootin’ boss
Here is the code that I used to create the trees_no_nps
point shapefile:
trees_no_nps <-
trees_classified %>%
st_filter(
st_union(national_parks),
.predicate = st_disjoint)
I then removed the names national_parks and
trees_classified from my global environment:
rm(national_parks, trees_classified)
… and created this interactive map of street trees in the District:
tm_basemap(c("OpenStreetMap",
"Esri.WorldImagery")) +
tm_shape(trees_no_nps,
name = "Trees") +
tm_dots('genus',
clustering = TRUE)